Sensors 2011, 11, 4372-4384; doi:10.3390/s110404372



OPEN ACCESS 



sensors 

ISSN 1424-8220 

www.mdpi.com/journal/sensors 

Article 

Boosting-Based On-Road Obstacle Sensing Using Discriminative 
Weak Classifiers 

Shyam Prasad Adhikari 1, Hyeon-Joong Yoo 2,* and Hyongsuk Kim 1

1 Division of Electronics and Information Engineering, Chonbuk National University, Jeonju 561-756, 
Korea; E-Mails: shyam.rvision@hotmail.com (S.P.A.); hskim@jbnu.ac.kr (H.K.) 
2 Department of IT Engineering, Sangmyung University, Chonan 330-720, Korea

* Author to whom correspondence should be addressed; E-Mail: yoohj@smu.ac.kr; 
Tel.: +82-10-3480-0809; Fax: +82-41-550-5355. 

Received: 14 January 2011; in revised form: 20 March 2011 / Accepted: 12 April 2011 /
Published: 14 April 2011 

Abstract: This paper proposes an extension of the weak classifiers derived from
Haar-like features for use in the Viola-Jones object detection system. These weak
classifiers differ from the traditional single threshold ones in that no specific threshold is
needed, and they give a more general solution to the non-trivial task of finding
thresholds for the Haar-like features. The proposed quadratic discriminant analysis based
extension markedly improves the ability of the weak classifiers to discriminate objects
from non-objects. The proposed weak classifiers were evaluated by boosting a single stage
classifier to detect the rear of cars. The experiments demonstrate that the object detector based
on the proposed weak classifiers yields higher classification performance with fewer
weak classifiers than the detector built with traditional single threshold weak classifiers.

1. Introduction

In pattern recognition, object detection is generally a two-class classification problem with two
essential issues: feature selection and classifier design based on the selected features. Classifiers
based on Haar-like features [1] have been successfully used for object detection. Viola and Jones [2]
proposed an object detection framework in which these Haar-like features are selected and a classifier is
trained using AdaBoost [3]. This approach has become a popular framework for object detection and
several extensions of it have been proposed. One line of extension improves
the boosting algorithm. Modified versions of AdaBoost such as Real AdaBoost [4], FloatBoost [5] and
KLBoosting [6] are available. Real AdaBoost is used for multi-view face detection [7]. In addition to
face detection [5], FloatBoost is also applied to hand shape detection [8]. The other line of extension of the
original framework uses an extended set of Haar-like features so that different image patterns can
be evaluated. In addition to the basic feature set of Figure 1(a), extended sets of Haar-like features as
shown in Figure 1(b,c) are introduced in [9,10] and [11]. Mita et al. [12] have selected multiple
co-occurring linear weak classifiers to form a more efficient classifier. Boosting in a hierarchical
feature space, where the local Haar-like features are replaced by global features derived from PCA in
later stages of boosting, is introduced in [13]. An extension of Haar-like features in which different
weights, determined by techniques such as brute-force search, genetic algorithms and Fisher's linear
discriminant analysis, are assigned to the rectangles of the Haar-like features is proposed in [14]. Hybrid
features composed of gradient features, Edgelet features and Haar-like features are used in [15] for
pedestrian detection.

Figure 1. Examples of the Haar-like feature set. (a) Basic feature set which consists of two 
adjacent rectangles, (b) and (c) Extended feature sets which consist of different number 
and arrangement of rectangles, respectively. 




The selection of the threshold for the Haar-like features is not a trivial task and has not been explained
in detail in [2]. The weak classifiers based on single threshold Haar-like features are sub-optimal and
not efficient for discriminating objects from non-objects. At later stages of the cascade these single
threshold Haar-like features become too weak for discrimination and make boosting ineffective [13].
In this paper, we propose a different set of weak classifiers for boosting that achieves higher
classification accuracy with fewer weak classifiers. Unlike in [2], the proposed weak
classifiers do not require explicit thresholds to be calculated for the Haar-like features and present a more
general solution to the threshold selection problem. The proposed weak classifiers also remain efficient
for discrimination at later stages of boosting.

The rest of the paper is organized as follows: Section 2 describes the AdaBoost learning of the 
Haar-like features. Section 3 presents the proposed method for realizing efficient weak classifiers. 
Experimental setup and results are presented in Section 4, followed by concluding remarks in Section 5. 

2. Boosting of Weak Classifiers 

This section describes the conventional weak classifiers and the AdaBoost learning algorithm for
constructing a strong classifier by selecting the weak classifiers.

2.1. Boosting of Weak Classifiers

The Haar-like features have scalar values that represent the difference in the sum of intensities
between adjacent rectangular regions. To capture ad hoc knowledge about the domain, these
features are evaluated exhaustively at different positions and with different sizes according to the base
resolution of the classifier. For example, when the classifier resolution is 24 × 18 pixels, 91,620 features
are generated from the five features in Figure 1(a,b). Each feature is evaluated on all the training
samples and the probability density for each of the object and non-object classes is calculated as shown
in Figure 2. In [2], a single threshold that separates these two distributions is selected for each feature.
These features along with their respective thresholds and polarities form the weak classifiers for the
learning algorithm.
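
The figure of 91,620 features can be checked by direct enumeration. The sketch below assumes that the five prototypes are the usual two two-rectangle, two three-rectangle and one four-rectangle shapes, with base sizes 2 × 1, 1 × 2, 3 × 1, 1 × 3 and 2 × 2 pixels; the text does not list them explicitly, so this set and the function name `count_placements` are illustrative assumptions.

```python
def count_placements(win_w, win_h, base_w, base_h):
    """Count every scaled and shifted copy of one prototype inside the window."""
    total = 0
    # Scale the prototype in integer multiples of its base size ...
    for w in range(base_w, win_w + 1, base_w):
        for h in range(base_h, win_h + 1, base_h):
            # ... and slide each scaled copy over every position that fits.
            total += (win_w - w + 1) * (win_h - h + 1)
    return total

# Assumed base shapes (width, height) of the five prototypes.
PROTOTYPES = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]

total_features = sum(count_placements(24, 18, bw, bh) for bw, bh in PROTOTYPES)
print(total_features)
```

Under these assumptions the enumeration reproduces the pool size quoted above for a 24 × 18 window.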

Figure 2. Example of feature value distributions. In [2] a single threshold that separates the 
two distributions is used.

A weak classifier can be mathematically described as:

$$h(x, f, p, \theta) = \begin{cases} 1 & \text{if } p\,f(x) < p\,\theta \\ -1 & \text{otherwise} \end{cases} \qquad (1)$$

where x is an image sub-window at the base resolution of the classifier, f the Haar-like feature, θ the threshold for the feature
and p the polarity indicating the direction of the inequality. The choice of the optimal threshold for the
features is not stated clearly in [2] and proves to be a non-trivial task.
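
A minimal sketch of a single threshold weak classifier is given below. It assumes that rectangle sums are computed with an integral image, as is common in the framework of [2], and that the two-rectangle feature is "left half minus right half"; the function names, the sign convention and the toy window are illustrative assumptions, not taken from the implementation described here.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a leading row and column of zeros."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of pixel intensities inside one rectangle, using four lookups."""
    return (ii[top + height, left + width] - ii[top, left + width]
            - ii[top + height, left] + ii[top, left])

def two_rect_feature(ii, top, left, height, width):
    """Left rectangle minus right rectangle (assumed sign convention)."""
    return (rect_sum(ii, top, left, height, width)
            - rect_sum(ii, top, left + width, height, width))

def weak_classify(feature_value, polarity, theta):
    """Single threshold rule: +1 if p*f(x) < p*theta, otherwise -1."""
    return 1 if polarity * feature_value < polarity * theta else -1

# Toy 24 x 18 window: bright left half, dark right half.
window = np.zeros((18, 24), dtype=np.int64)
window[:, :12] = 10
ii = integral_image(window)
value = two_rect_feature(ii, 0, 0, 18, 12)
print(value, weak_classify(value, -1, 1000), weak_classify(value, 1, 1000))
```

The polarity only flips the direction of the comparison, so the same feature and threshold can vote for either class.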

2.2. AdaBoost 

AdaBoost is a machine learning boosting algorithm that constructs a strong classifier by combining
a set of weak classifiers. A small number of discriminative weak classifiers are selected by updating
the sample distribution. The prediction of the strong classifier is produced through a weighted majority
vote of the weak classifiers. Pseudo code of the variant of AdaBoost used in the implementation is
given in Algorithm 1.

Algorithm 1. Pseudo code of Discrete AdaBoost.

- Given example images (x_1, y_1), ..., (x_n, y_n) where y_i = -1, 1 for negative and positive examples.

- Initialize weights $w_{1,i} = \frac{1}{l + m}$, where l and m are the number of positives and negatives respectively.

- For t = 1, ..., T:

1) Normalize the weights,

$$w_{t,i} \leftarrow \frac{w_{t,i}}{\sum_{j=1}^{n} w_{t,j}}$$

2) Select the best weak classifier with respect to the weighted error:

$$\varepsilon_t = \min_{f,p,\theta} \sum_i w_{t,i}\, \mathbf{1}\left[h(x_i, f, p, \theta) \neq y_i\right]$$

3) Define h_t(x) = h(x, f_t, p_t, θ_t) where f_t, p_t and θ_t are the minimizers of ε_t.

4) Update the weights:

$$w_{t+1,i} = w_{t,i} \times \begin{cases} e^{-\alpha_t} & \text{if } y_i = h_t(x_i) \\ e^{\alpha_t} & \text{if } y_i \neq h_t(x_i) \end{cases}$$

where

$$\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \varepsilon_t}{\varepsilon_t}\right)$$

The final strong classifier is:

$$H_{final}(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$$
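
The steps of Algorithm 1 can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation used in the experiments: the feature values are assumed to be precomputed in a matrix `F` (one row per sample, one column per feature), candidate thresholds are simply the observed feature values, and the names `best_stump`, `adaboost` and `predict` are hypothetical.

```python
import numpy as np

def best_stump(F, y, w):
    """Step 2: search (feature, polarity, threshold) for the lowest weighted error."""
    best = (np.inf, 0, 1, 0.0)
    for j in range(F.shape[1]):
        for theta in np.unique(F[:, j]):      # assumed candidate thresholds
            for p in (1, -1):
                pred = np.where(p * F[:, j] < p * theta, 1, -1)
                err = w[pred != y].sum()
                if err < best[0]:
                    best = (err, j, p, theta)
    return best

def adaboost(F, y, rounds):
    """Discrete AdaBoost over single threshold weak classifiers."""
    n_pos, n_neg = (y == 1).sum(), (y == -1).sum()
    w = np.full(len(y), 1.0 / (n_pos + n_neg))     # initial weights 1/(l + m)
    model = []
    for _ in range(rounds):
        w = w / w.sum()                            # step 1: normalize
        err, j, p, theta = best_stump(F, y, w)     # steps 2 and 3
        err = np.clip(err, 1e-10, 1 - 1e-10)       # guard against log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(p * F[:, j] < p * theta, 1, -1)
        w = w * np.exp(-alpha * y * pred)          # step 4: e^-alpha if correct, e^alpha if not
        model.append((alpha, j, p, theta))
    return model

def predict(model, F):
    """Sign of the alpha-weighted vote of the selected weak classifiers."""
    score = sum(a * np.where(p * F[:, j] < p * t, 1, -1) for a, j, p, t in model)
    return np.sign(score)

# Toy data: one feature, positives have small values.
F = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([1, 1, -1, -1])
model = adaboost(F, y, rounds=2)
print(predict(model, F))
```

The exhaustive threshold search is quadratic in the number of samples per feature and is kept only for readability; a sorted sweep over the feature values would be the usual choice at the scale of the feature pool considered here.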



3. Proposed Weak Classifiers 

This section describes the proposed weak classifiers, which eliminate the need for an explicit threshold
for the Haar-like features. First we formulate the definition of the new weak classifiers based on
Bayesian decision theory and quadratic discriminant analysis [16]. Later we discuss the motivation for
using the proposed weak classifiers and their relative advantage over the traditional single threshold
weak classifiers.

3.1. Bayesian Decision Rule 

Given a set of features, the Bayesian decision theory for classification requires decision boundaries
that minimize the error rate on the training data. Let us consider a two class problem with ω_1 and ω_2 as
the states of nature. If x is the observed feature value, the decision boundary that minimizes the
classification error is given in terms of the posterior probabilities as P(ω_1|x) = P(ω_2|x). The
corresponding decision rule is: decide ω_1 if P(ω_1|x) > P(ω_2|x); else decide ω_2.

3.2. Discriminant Function for Normal Density

One of the most useful ways to represent pattern classifiers is in terms of a set of discriminant
functions g_i(x), i = 1, 2, ..., c, where c is the number of categories to discriminate. The classifier is said
to assign a feature x to class ω_i if:

$$g_i(x) > g_j(x) \quad \text{for all } j \neq i \qquad (2)$$

The effect of the decision rule is to divide the feature space into c decision regions. The regions are
separated by decision boundaries, surfaces in the feature space where ties occur among the largest
discriminant functions [16]. Assuming the distribution of the univariate Haar-like features to be
normal, i.e., p(x|ω_i) ~ N(μ_i, Σ_i), minimum error rate classification can be achieved by the use of
a discriminant function of the form given in Equation (3) [16]:

$$g_i(x) = -\frac{1}{2}(x - \mu_i)^T \Sigma_i^{-1} (x - \mu_i) - \frac{1}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i) \qquad (3)$$

where P(ω_i) is the prior probability of class ω_i. Taking the general univariate normal case with different
variances for each category, the resulting discriminant function is given as:

$$g_i(x) = x^T W_i x + w_i^T x + w_{i0} \qquad (4)$$

where:

$$W_i = -\frac{1}{2}\Sigma_i^{-1}$$

$$w_i = \Sigma_i^{-1}\mu_i$$

$$w_{i0} = -\frac{1}{2}\mu_i^T \Sigma_i^{-1} \mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$

The discriminant functions of Equation (4) are inherently quadratic. The decision surfaces are
hyperquadrics, and in the one-dimensional case the decision regions need not be simply connected, as shown
in Figure 3. This observation motivates us to formulate a new kind of weak classifier without explicitly
specifying a threshold for each weak classifier.
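
The non-simply connected regions can be made concrete with a small numerical sketch. It computes the univariate coefficients of Equation (4) for two classes and solves g_1(x) = g_2(x); the class parameters (equal means, variances 1 and 9, equal priors) and the function names are illustrative assumptions chosen only to produce two boundaries.

```python
import numpy as np

def discriminant_coeffs(mean, var, prior):
    """Coefficients (W, w, w0) of g(x) = W*x**2 + w*x + w0 in the univariate case."""
    W = -0.5 / var
    w = mean / var
    w0 = -0.5 * mean**2 / var - 0.5 * np.log(var) + np.log(prior)
    return np.array([W, w, w0])

def decision_boundaries(class_1, class_2):
    """Real roots of g_1(x) - g_2(x) = 0, sorted in increasing order."""
    diff = discriminant_coeffs(*class_1) - discriminant_coeffs(*class_2)
    roots = np.roots(diff)
    return np.sort(roots[np.isreal(roots)].real)

narrow = (0.0, 1.0, 0.5)   # (mean, variance, prior), illustrative values
wide = (0.0, 9.0, 0.5)
boundaries = decision_boundaries(narrow, wide)
print(boundaries)
```

With these values the narrow class wins between the two boundaries and the wide class wins on both sides of them, so the decision region of the wide class consists of two disjoint intervals, which no single threshold can represent.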

Figure 3. Non-simply connected decision regions in one dimension.

3.3. Proposed Weak Classifiers

The proposed weak classifiers are based on the quadratic discriminant functions described above.
Each Haar-like feature from the pool of 91,620 features is evaluated on the training samples and
one-dimensional probability densities for the object and non-object classes are calculated. Assuming the
density of each feature to be normal, the distributions of the feature on the object and non-object classes
are parameterized by their maximum likelihood estimators, i.e., mean μ and variance Σ. The
distribution for the object (positive) class is p(x|ω_p) ~ N(μ_p, Σ_p) and for the non-object (negative) class is
p(x|ω_n) ~ N(μ_n, Σ_n). The decision regions for the two distributions are given from Equation (2), i.e.,
assign the observed feature value x to class ω_p if:

$$g_p(x) > g_n(x) \qquad (5)$$


This decision rule divides the feature space into decision regions which need not be simply
connected for the same class. The proposed weak classifiers for the Haar-like features are defined as:

$$h(x, f) = \begin{cases} 1 & \text{if } g_p(f(x)) > g_n(f(x)) \\ -1 & \text{otherwise} \end{cases} \qquad (6)$$

where x is an image sub-window at the base resolution of the classifier and f is the Haar-like feature. Since a more general
model of the distribution is considered, the proposed weak classifiers are expected to perform better
than the single threshold weak classifier.
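
A minimal sketch of such a weak classifier on toy data is shown below. It fits the maximum likelihood mean and variance per class, takes the priors from the class fractions (an assumption; the text does not say how the priors are set during boosting), and compares the result with an "average of means" threshold. The feature values and function names are illustrative only.

```python
import numpy as np

def fit_gaussian(values, prior):
    """Maximum likelihood mean and variance of one class, plus its prior."""
    return values.mean(), values.var(), prior

def g(x, mean, var, prior):
    """Univariate quadratic discriminant function."""
    return (-0.5 * x**2 / var + mean * x / var
            - 0.5 * mean**2 / var - 0.5 * np.log(var) + np.log(prior))

def qda_weak_classifier(x, pos_params, neg_params):
    """+1 where g_p(x) > g_n(x), otherwise -1; no explicit threshold is stored."""
    return np.where(g(x, *pos_params) > g(x, *neg_params), 1, -1)

pos = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])         # toy object feature values
neg = np.array([-6.0, -4.0, -3.0, 3.0, 4.0, 6.0])   # toy non-object feature values
n = len(pos) + len(neg)
pos_params = fit_gaussian(pos, len(pos) / n)
neg_params = fit_gaussian(neg, len(neg) / n)

x = np.concatenate([pos, neg])
y = np.concatenate([np.ones(len(pos)), -np.ones(len(neg))])
qda_error = np.mean(qda_weak_classifier(x, pos_params, neg_params) != y)

# Single threshold baseline at the average of the two class means, best polarity.
theta = 0.5 * (pos_params[0] + neg_params[0])
stump_error = min(np.mean(np.where(p * x < p * theta, 1, -1) != y) for p in (1, -1))
print(qda_error, stump_error)
```

Both toy classes have the same mean and differ only in spread, so the threshold at the average of means is no better than chance, while the quadratic rule separates the samples with two boundaries.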

For the weak classifiers of Equation (1), each feature produces a single scalar value and the decision
boundary corresponds to a scalar threshold. But the choice of this threshold is not stated clearly in [2]
and determination of an optimal threshold is a non-trivial task. The proposed weak classifiers of
Equation (6) are more general and do not require any explicit representation of the threshold. In fact,
the weak classifiers of Equation (1) are a special case of the proposed weak classifiers when Σ_p and Σ_n
are identical. The weak classifiers based on a single threshold commonly employ the "average of means" of
the two distributions, i.e., (μ_p + μ_n)/2, as the decision threshold. Under this hypothesis, it is statistically
observed that most of the Haar-like features are non-discriminative and inefficient for boosting.
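
The special case can be worked out from the univariate form of the discriminant functions. Writing σ² for the common variance (a symbol introduced here for illustration), the quadratic terms cancel and the boundary reduces to one threshold; the additional assumption of equal priors, which the text does not state, then gives the average of means.

```latex
\begin{align*}
g_p(x) - g_n(x)
  &= \frac{\mu_p - \mu_n}{\sigma^2}\,x
     - \frac{\mu_p^2 - \mu_n^2}{2\sigma^2}
     + \ln\frac{P(\omega_p)}{P(\omega_n)}
  && (\Sigma_p = \Sigma_n = \sigma^2) \\
g_p(x^*) = g_n(x^*)
  &\;\Longrightarrow\;
  x^* = \frac{\mu_p + \mu_n}{2}
        - \frac{\sigma^2}{\mu_p - \mu_n}\,\ln\frac{P(\omega_p)}{P(\omega_n)} \\
P(\omega_p) = P(\omega_n)
  &\;\Longrightarrow\;
  x^* = \frac{\mu_p + \mu_n}{2}
\end{align*}
```

With unequal variances the x² term survives, which is what allows up to two boundaries per feature.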

Figure 4. Typical distribution of feature values in later stages of boosting on the training 
data (described later).

The error rates of the single threshold weak classifiers selected at later stages of the boosting
process become large, as the sample distribution then consists of samples which are difficult to discriminate,
as shown in Figure 4. The single threshold weak classifiers are not efficient in discriminating such
distributions. The proposed weak classifiers are expected to discriminate the underlying
distribution of Figure 4 efficiently, as disjoint decision regions are also supported, as shown in Figure 3.

4. Experimental Results 

4.1. Data Preparation 

The experiments were carried out for detection of the rear of cars, using
1,500 positive and 3,500 negative samples. The positive samples consisted of instances of the rear of
cars cropped from a video taken from a camera mounted at the front of a host car while driving in an
urban environment. Each instance was resized to a base size of 24 × 18 pixels. The negative samples
consisted of images cropped from random high resolution images that did not contain any instance of
the rear of a car. Each negative sample was also resized to the base size of 24 × 18 pixels. Of these, 1,000 positive
and 3,000 negative samples were used for training the classifiers, while the remaining 500 positive
and 500 negative samples were used for validation. Figure 5 shows some of the positive and negative
samples used for the experiment.

Figure 5. Examples of the rear-of-car images (left) and the non-car images (right) used for training.

4.2. Performance Comparison between Proposed Weak Classifiers and Single Threshold Weak
Classifiers

A single stage classifier was trained by AdaBoost on the training data using the proposed weak
classifiers to achieve a 100% hit rate on the positive samples and zero false positives on the negative
samples. The final strong classifier achieved the required performance on the training data with a total
of 69 proposed weak classifiers. The first weak classifier selected by AdaBoost yielded an error rate 
of 0.2. The subsequent selected weak classifiers yielded comparatively higher error rates. The worst 
error rate among the selected classifiers was 0.37 for the 66th classifier. The error rates of subsequent 
selected classifiers can be seen in Figure 6. Another strong classifier was trained on the same training 
data using the conventional single threshold based weak classifiers. These classifiers employed the 
average of means as the threshold. The final strong classifier required 225 weak classifiers to achieve 
similar performance on the training data. The error rate of the first selected weak classifier was 0.21,
but it increased rapidly for the subsequent classifiers and the worst was 0.44 for the 208th classifier. A
third strong classifier was trained on the same training data using single threshold based weak
classifiers which employed Otsu's method [17] for optimal threshold selection. The final classifier
consisted of 125 weak classifiers. The error rate of the first weak classifier was 0.22 and the highest
error rate was 0.41 for the 124th classifier. Figure 6 shows that the error rates of the proposed weak
classifiers are consistently lower than those of their single threshold based counterparts.

Most of the features selected using the proposed weak classifiers have overlapping distributions of 
the object and non-object classes. Though these features have lower error rates and are boostable under 
the proposed hypothesis, they would have been rendered useless for boosting under the single 
threshold hypothesis. Some of the feature distributions are shown in Figure 7. 

Figure 6. Error rates of the weak classifiers selected by boosting using proposed weak 
classifiers and the single threshold weak classifiers.

Figure 7. Distributions of feature values of the 6th (left) and the 9th (right) features
selected by AdaBoost using the proposed weak classifiers. The features have error rates
of 0.27 and 0.3 respectively under the proposed hypothesis. The single threshold approach
will reject these features as inefficient since a single threshold is not sufficient to
discriminate these types of distributions, which are unimodal or close to unimodal.

The selection of efficient classifiers at each round of boosting helps the learning algorithm to
converge faster (in terms of the number of weak classifiers) on the training data as can be seen in 
Figure 8. Similar performance (in terms of hit rate and false positive rate) on the training data can be 
achieved with a significant reduction in the total number of weak classifiers by using the proposed 
weak classifiers over the conventional single threshold weak classifiers. 

Figure 8. Plot of training error and the number of weak classifiers: the proposed weak 
classifiers and the single threshold (derived from average of means and Otsu's optimal 
thresholding method) based weak classifiers.

Figure 9. ROC curves for one stage classifiers trained using the proposed weak classifiers
and single threshold (derived from average of means and Otsu's optimal thresholding
method) based weak classifiers.

To investigate the generalization performance of the proposed weak classifiers, the strong classifiers
were tested on a validation dataset. The validation data set consisted of 500 positive and 500 negative 
sample images that were not used for training. ROC curves were generated using the validation dataset. 
The points in the ROC curves were obtained by evaluating each of the strong classifiers against the 
validation dataset by sliding the stage threshold from -10 to +10 in steps of 0.25. This range of
stage thresholds was chosen because varying the threshold within it proved sufficient to generate the
whole range of the ROC curves. The plot of the hit rate versus the false alarm rate for all the methods
is given in Figure 9. From the ROC curves in Figure 9, we can see that the classifier trained using the
proposed weak classifiers performs consistently better than the classifiers trained using single threshold
weak classifiers. For a given false alarm rate, the detection rate of the classifier based on the proposed
weak classifiers is always higher than that of the classifiers based on single threshold weak classifiers,
and for a given detection rate, the classifier using the proposed weak classifiers always has a lower
false alarm rate than the classifiers using single threshold weak classifiers. The higher performance of
the proposed method reflects the benefit of using discriminant function based weak classifiers, which
are more effective at discriminating car and non-car examples.
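
The threshold sweep described above can be sketched as follows. The sketch assumes that each validation sample has a score equal to the weighted sum of weak classifier outputs and that a sample is accepted when its score is at least the stage threshold; the score values, the comparison direction and the function name `roc_points` are illustrative assumptions.

```python
import numpy as np

def roc_points(scores, labels, thresholds):
    """(false alarm rate, hit rate) for each stage threshold."""
    points = []
    for thr in thresholds:
        detected = scores >= thr
        hit_rate = np.mean(detected[labels == 1])            # fraction of positives accepted
        false_alarm_rate = np.mean(detected[labels == -1])   # fraction of negatives accepted
        points.append((false_alarm_rate, hit_rate))
    return points

thresholds = np.arange(-10.0, 10.0 + 0.25, 0.25)        # -10 to +10 in steps of 0.25
scores = np.array([3.2, 1.1, -0.4, 0.7, -2.5, -6.0])    # toy strong-classifier scores
labels = np.array([1, 1, 1, -1, -1, -1])
curve = roc_points(scores, labels, thresholds)
print(len(curve), curve[0], curve[-1])
```

The lowest threshold accepts every sample and the highest rejects every sample, so the sweep spans both ends of the ROC curve whenever all scores lie inside the swept range.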

4.3. Performance Comparison on Relatively Difficult Samples 

In this experiment, the classifiers were trained on samples that are more difficult than those of
Section 4.2. The positive samples contained 500 positive images from the original training set. The
negative samples contained 2,000 negative images. The negative samples were the false positives
generated when a 10 stage cascade was evaluated on random high resolution images. The 10 stage
cascade was trained on the original training set using the single threshold weak classifiers. In this
sense, the negative samples are relatively difficult for the single threshold weak classifiers to
discriminate. Three different classifiers were trained using the three types of weak classifiers on this
data to achieve a 100% hit rate and zero false positives. The classifier using the proposed weak
classifiers required only 36 features, whereas the classifier using single threshold weak classifiers
employing the average of means required 236 features and the classifier employing Otsu's thresholding
method required 90 features to achieve the same performance on the training data. This shows that the
proposed weak classifiers are more efficient at discriminating difficult samples than their single
threshold counterparts. The generalization performance of the trained classifiers was tested on the
validation set as in Section 4.2. The ROC curves in Figure 10, generated against the validation dataset,
show a significant performance improvement of the classifier trained using the proposed weak classifiers
over the classifiers using single threshold weak classifiers.

Figure 10. ROC curves for single stage classifier trained on difficult samples using the 
proposed weak classifiers and the single threshold weak classifiers. Single stage classifiers 
were trained on the negative samples acquired as false positives of an already 
trained 10 stage cascade using the single threshold based weak classifiers.

4.4. Comparison of the Speed of the Object Detector

The speed of a cascaded object detector depends directly on the number of features evaluated
per scanned sub-window in the image. To compare the speed of the detectors, we trained three 15 stage
cascades, using the two types of single threshold classifiers and the proposed weak classifiers, on the UIUC Image
database for car detection. The cascades were evaluated on the UIUC car test image set at different
scales. For the classifier with a single threshold employing the average of means, an average of 14.5 features
out of the total of 248 features, and for the one employing Otsu's optimal threshold an average
of 13 features out of the total of 230, were evaluated per sub-window, whereas for the proposed
classifiers an average of only 8 features out of the total of 131 features were evaluated per sub-window.
Table 1 shows the total number of sub-windows scanned and the total features evaluated for seven
images randomly sampled from the UIUC car test images at different scales. The feature value
calculation time for the proposed classifier is the same as that for the single threshold Haar-like
features. But from Equation (3) we see that the proposed weak classifier requires additional
multiplication and addition operations to make the class decision. This makes it relatively more
expensive to compute than the single threshold classifiers. The experiments conducted show that the
proposed weak classifier requires around 1.6 times the computation time of the single threshold
classifier to make a class decision. However, as seen from Table 1, the single threshold classifiers need
to evaluate on average around 1.6 times as many features per sub-window as the proposed weak
classifiers. This makes the speed of the proposed detector comparable to that of the conventional single
threshold based detector.
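
The trade-off can be illustrated with a back-of-the-envelope calculation. The sketch below uses a deliberately crude cost model that is an assumption of this illustration: the cost per sub-window is taken as the average number of features evaluated times the relative cost of one class decision, with the single threshold decision as the unit. The figures 1.6, 8, 13 and 14.5 are the ones quoted in the text; the names are hypothetical.

```python
DECISION_COST_RATIO = 1.6   # proposed vs. single threshold, per class decision

# Average number of features evaluated per sub-window, as quoted in the text.
avg_features = {"proposed": 8.0, "average of means": 14.5, "Otsu": 13.0}

def relative_cost(method):
    """Average features per sub-window times the relative cost of one decision."""
    per_feature = DECISION_COST_RATIO if method == "proposed" else 1.0
    return avg_features[method] * per_feature

costs = {method: relative_cost(method) for method in avg_features}
print(costs)
```

Under this model the three detectors land within a few units of one another, in line with the statement that their speeds are comparable.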



Table 1. Comparison of the speed of the detectors in terms of the average number of
features evaluated per scanned window in the test images.

                      Proposed Method        Single Threshold Method   Single Threshold Method
                                             (average of means)        (Otsu's threshold)
Image  Sub-windows    Total       Average    Total       Average       Total       Average
       scanned        features    per window features    per window    features    per window
1      63,313         521,545     8.23       868,037     13.7          790,047     12.4
2      63,618         541,644     8.51       929,962     14.61         859,328     13.5
3      85,106         692,548     8.13       1,173,507   14.9          1,032,627   12.13
4      87,378         763,826     8.74       1,302,362   14.9          1,122,984   12.85
5      40,810         366,326     8.97       653,903     16.3          620,030     15.19
6      82,354         688,846     8.36       1,078,020   13.1          1,052,336   12.77
7      58,590         492,783     8.41       842,548     14.38         753,639     12.86



5. Conclusions 

In this paper, we have proposed a new set of weak classifiers for efficient boosting. The proposed
weak classifiers do not require an explicit decision threshold to be calculated, as is required for the
single threshold weak classifiers, and present a general solution to the optimal threshold finding
problem. The proposed quadratic discriminant analysis based solution significantly improves the
ability of the weak classifiers to discriminate object and non-object classes. The experimental results
demonstrate that the proposed weak classifiers have a far lower classification error rate than the single
threshold weak classifiers. Training an object detector with AdaBoost on the proposed weak classifiers
facilitated efficient boosting, and the final classifier yielded higher classification
performance with fewer weak classifiers than a detector built with traditional single threshold
weak classifiers.